Bonus slides
Contents
arr_job.sh
#!/usr/bin/bash
#SBATCH --array=1-100
#SBATCH --job-name=Hello-World
echo "Hello I am job $SLURM_ARRAY_TASK_ID out of $SLURM_ARRAY_TASK_COUNT"
Run like this (sbatch call):
sbatch arr_job.sh
# also possible:
# - sbatch --array=1-100 anotherjob.sh
# - sbatch --array=23,42,8,15 anotherjob.sh
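The two --array forms above (a range and a comma-separated list) determine which values $SLURM_ARRAY_TASK_ID takes and what $SLURM_ARRAY_TASK_COUNT reports. A minimal Python sketch of that expansion follows. The helper name expand_array_spec is hypothetical, and it only covers plain ranges and comma lists, not the other forms Slurm accepts.

```python
from typing import List


def expand_array_spec(spec: str) -> List[int]:
    """Expand a simple --array value such as '1-100' or '23,42,8,15'.

    Hypothetical helper for illustration: handles comma-separated
    entries and plain first-last ranges only.
    """
    task_ids = []
    for part in spec.split(","):
        if "-" in part:
            first, last = part.split("-")
            # Ranges are inclusive on both ends
            task_ids.extend(range(int(first), int(last) + 1))
        else:
            task_ids.append(int(part))
    return task_ids


if __name__ == "__main__":
    for spec in ("1-100", "23,42,8,15"):
        ids = expand_array_spec(spec)
        # len(ids) corresponds to what each task sees as SLURM_ARRAY_TASK_COUNT
        print(f"--array={spec}: {len(ids)} tasks, first={ids[0]}, last={ids[-1]}")
```

Each task in the array runs the same script; only the task ID differs.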
First, get a list of all files to process:
find /path -type f -name '*.bam' | sort > files.txt
Now, the job script:
#!/usr/bin/bash
n=$SLURM_ARRAY_TASK_ID
file_path=$(awk "(NR == $n)" files.txt)
md5sum "$file_path" > "$file_path.md5"
Now, submit as:
wc -l < files.txt
# OUTPUT: number of lines in files.txt (reading from stdin keeps the file name out of the output)
sbatch --array=1-$(wc -l < files.txt) job_script.sh
# will run, e.g., 'sbatch --array=1-234 job_script.sh'
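The same per-task logic can be written in Python, which may help if the real processing step is more than a one-liner. This is only a sketch of the job script above, not a replacement for it. The function names nth_line and md5_hex are made up for illustration, and files.txt is assumed to be in the working directory as in the shell version.

```python
import hashlib
import os
from pathlib import Path


def nth_line(list_file, n):
    """Return line n (1-based) of list_file, like awk "(NR == n)"."""
    with open(list_file) as handle:
        for line_number, line in enumerate(handle, start=1):
            if line_number == n:
                return line.rstrip("\n")
    raise IndexError(f"{list_file} has fewer than {n} lines")


def md5_hex(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def main():
    # Slurm sets this variable for every task of an array job
    task_id = int(os.environ["SLURM_ARRAY_TASK_ID"])
    file_path = nth_line("files.txt", task_id)
    # Same "digest  name" layout (two spaces) that md5sum prints
    Path(file_path + ".md5").write_text(f"{md5_hex(file_path)}  {file_path}\n")


if __name__ == "__main__":
    main()
```

Because task IDs start at 1 in the submission above, task n handles line n of files.txt.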
threading: library with low-level primitives
multiprocessing? Processes?
map(func, list) -> list
    apply func to each element of the list to obtain a new list of the same size
apply(func, list)
    apply func to each element of the list, ignoring results
N threads: multiprocessing.pool.ThreadPool
multiprocessing.Pool() (process pool ;-))
    func must be serializable (top-level function!)
parallel is a command line tool that allows you to run commands in parallel
Placeholders
    {} - whole argument
    {/} - basename (filename) of argument
    {//} - dirname of argument
man parallel for more
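parallel applies one command template to every argument, much like map(func, list) from the Python slides. The sketch below shows what the three placeholders become for a given path and builds the resulting command strings with a thread pool. It is a rough model, not how parallel is implemented: the helper names expand_placeholders and build_commands and the example paths are invented for illustration, and edge cases may differ from the real tool (for an argument with no directory part, os.path.dirname returns an empty string).

```python
import os
from multiprocessing.pool import ThreadPool


def expand_placeholders(template, argument):
    """Substitute {//}, {/} and {} in template for one argument.

    Hypothetical helper; covers only the three placeholders listed above.
    """
    return (
        template.replace("{//}", os.path.dirname(argument))   # dirname
        .replace("{/}", os.path.basename(argument))           # basename
        .replace("{}", argument)                               # whole argument
    )


def build_commands(template, arguments, workers=4):
    """map(func, list) with N threads: one expanded command per argument."""
    with ThreadPool(workers) as pool:
        # A lambda is fine for a thread pool; a process pool would need
        # a serializable top-level function instead.
        return pool.map(lambda arg: expand_placeholders(template, arg), arguments)


if __name__ == "__main__":
    paths = ["/data/run1/sample.bam", "/data/run2/other.bam"]
    for command in build_commands("md5sum {} > {//}/{/}.md5", paths):
        print(command)
```

pool.map returns results in the order of the input list, so the commands line up with the arguments.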